Collider Inclusive Jet Data and the Gluon Distribution 
Inclusive jet production data are important for constraining the gluon distribution in the global
QCD analysis of parton distribution functions. With the addition of recent CDF and D0 Run II jet
data, we study a number of issues that play a role in determining the up-to-date gluon distribution 
and its uncertainty, and produce a new set of parton distributions that make use of that data. We 
present in detail the general procedures used to study the compatibility between new data sets and 
the previous body of data used in a global fit. We introduce a new method in which the Hessian 
matrix for uncertainties is "rediagonalized" to obtain eigenvector sets that conveniently characterize 
the uncertainty of a particular observable. 



1. INTRODUCTION



The gluon distribution $g(x,\mu)$ plays an important
role in high energy collider phenomenology, both for 
standard model and new physics. Yet it is the most 
elusive of the parton distribution functions (PDFs) in 
contemporary global QCD analysis. At moderate val- 
ues of the momentum fraction x, extensive high preci- 
sion data on deep inelastic scattering (DIS) constrain 
$g(x,\mu)$ fairly well through the $\mu$-dependence that is
predicted by QCD. The little information we have at 
large x comes mostly from inclusive jet production 
at hadron colliders, which receives contributions di- 
rectly from the gluon distribution at leading order in 
$\alpha_s$. The recently published inclusive jet data from
Tevatron Run II measurements by CDF [l| and D0
are therefore of considerable interest for improving 
our knowledge of the gluon distribution. 

Previous CTEQ studies [1, [| have used only the 
Run I jet data 0, A recent MSTW study % in- 
cludes the Run II data in an analysis with aims par- 
allel to this one. A comparison with their results is 
presented in Sec. [S] 

In this paper, we make a detailed study of several 
issues that bear on the behavior of the gluon distribu- 
tion and its range of uncertainties, focusing on the use 
of Tevatron inclusive jet data. (Inclusive jet produc- 
tion in DIS processes can also provide constraints on 
the gluon distribution; but those constraints are con- 
siderably weaker and we do not include them here.) 
Some of the results and techniques described here are 
known to many practitioners in the field, but have not 
been previously documented in the literature. Some 
of these results are frequently misunderstood — e.g., in 
discussions at workshops — so it seems worthwhile to 
set them out in systematic detail. The methods dis- 



cussed here for the inclusive jet data thus serve as a 
pedagogical study of techniques that can be applied 
in general when new data sets become available to ad- 
vance the PDF analysis. 

One of the techniques we use is presented here for 
the first time. It involves orienting the choice of eigen- 
vector directions in the Hessian method in order to 
simplify the study of uncertainty for any particular 
quantity of interest. 



2. THEORY CALCULATIONS FOR
INCLUSIVE JETS 



Up to now, the CTEQ global analyses of jet cross 
sections as a function of jet transverse momentum pT 
have been based on the EKS NLO program 0. Recently,
the FastNLO implementation [l^ of the NLOJET++ [ll[
calculation has gained increasing use, in
part because of its convenient interface. (FastNLO al- 
lows the dependence on the PDFs of the NLO cross 
section to be included in the computation of $\chi^2$ at
every step within the fitting procedure. However, we find
that calculating the ratio K=NLO/LO for each data 
point using a single typical fit to the data provides an 
adequate approximation.) To make sure that the two 
calculations are consistent in the global analysis con- 
text, we have directly compared their results in the 
Tevatron Run I and II kinematic ranges. The theo- 
retical results also depend on choices of: (i) the renor- 
malization and factorization scales in PQCD, usually 
taken to be the same, say $\mu$; and (ii) the jet algorithm,
including parameters such as $R_{\rm sep}$ (for separation of
neighboring jets). We have performed the comparison
using a variety of these choices. The results provide
information on the importance of these factors for
FIG. 1: Theory calculations for the ratio K = NLO/LO from FastNLO and EKS. FastNLO with $\mu = p_T$: $R_{\rm sep} = 2.0$ (long
dash dot), $R_{\rm sep} = 1.3$ (short dash dot); FastNLO with $\mu = p_T/2$: $R_{\rm sep} = 2.0$ (long dash), $R_{\rm sep} = 1.3$ (short dash); EKS with
$\mu = p_T/2$, $R_{\rm sep} = 1.3$ (solid).



the global analysis. 
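The K-factor approximation described above can be illustrated with a minimal Python sketch. It is not code from this analysis: the function names and all cross-section values are made up for illustration. K is computed once per data point from a reference fit, and the NLO prediction at a later fitting step is approximated as K times the LO cross section recomputed with the current PDFs.

```python
def k_factors(nlo_reference, lo_reference):
    """Per-data-point K = NLO/LO, computed once with a reference PDF set."""
    return [nlo / lo for nlo, lo in zip(nlo_reference, lo_reference)]


def approximate_nlo(lo_current, k):
    """Approximate NLO cross sections for the current PDFs as K * LO."""
    return [kf * lo for kf, lo in zip(k, lo_current)]


# Made-up cross sections for three pT bins (arbitrary units).
nlo_reference = [120.0, 30.0, 4.4]
lo_reference = [100.0, 24.0, 4.0]
k = k_factors(nlo_reference, lo_reference)

# LO recomputed with the PDFs of the current fitting step (also made up).
lo_current = [98.0, 25.0, 4.2]
print(approximate_nlo(lo_current, k))
```

The approximation is exact when the current PDFs coincide with the reference ones, and it degrades only to the extent that K itself depends on the PDFs.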

Figure [1] shows the K-factor, defined by K = 
NLO/LO, for the Tevatron Run II pT range in sev- 
eral of the experimental rapidity intervals. Each plot 
shows results from FastNLO for two choices of the 
scale: $\mu = p_T$ (upper two curves), and $\mu = p_T/2$ (lower
two dashed curves). Within each of these pairs, the 
upper curve uses the midpoint cone jet algorithm with 
$R_{\rm sep} = 2.0$, while the lower curve uses the midpoint algorithm
with $R_{\rm sep} = 1.3$. The solid curve shows the result
of the EKS program for $\mu = p_T/2$ and $R_{\rm sep} = 1.3$.
(The wiggles in this curve are caused by fluctuations 
from the Monte Carlo integration used in EKS.) We 
observe the following: 

• The overall agreement between the EKS and 
FastNLO calculations is satisfactory, though not per- 
fect. Our results from parallel global analyses based on 
these two methods for calculating jets, with all other 




FIG. 2: Effect of scale choice on predicted cross section with $R_{\rm sep} = 1.3$: $\mu = 2p_T$ (short dash), $p_T$ (long dash), $p_T/2$
(solid), $p_T/4$ (dotted), relative to our Standard Choice ($\mu = p_T/2$, $R_{\rm sep} = 1.3$, no "two-loop" correction). Right panels
include the "two-loop" resummation correction. Uncertainty bands from PDFs are shown for comparison.



options identical, show good agreement, which indicates
that results of the global analysis are not sensitive
to deviations of the magnitude shown. We use
the FastNLO results for the remainder of this investi- 
gation. 

• The effect of the $R_{\rm sep}$ choice is quite small.

Since the scale choice affects the predicted cross sec- 
tion directly through the LO cross section, as well as 
through the K factor, we explore it further in Fig. [2],
which shows the predicted cross section for various
scale choices normalized by our "standard" choice.
The plots on the left correspond to the conventional 
NLO calculation, while those on the right also include 
a "2-loop" correction derived from threshold resumma- 
tion which is available in FastNLO. We only show 
results from one central and one large rapidity bin; re- 
sults at intermediate y interpolate between these two. 
The bands in each plot represent the estimated uncer- 
tainty due to the PDFs, for comparison. We conclude 
the following: 
• The low-scale choice $\mu = p_T/4$ leads to results that
are far from the other choices at large rapidity, and 
shows unstable behavior with respect to 2-loop correc- 
tions, which lie mostly outside the PDF uncertainty 
bands. This scale choice is thus unsuitable for theoret- 
ical calculations, as is also apparent from the fact that 
K = NLO/LO is far from 1 with this choice, which sug- 
gests that still higher order corrections would be very 
important. By contrast, the other three scale choices 
show consistent patterns and yield stable results. 

• One may use the range $p_T/2 < \mu < 2p_T$ to empirically
estimate the uncertainty due to uncalculated
higher-order corrections. This range of theoretical uncertainty
is seen to be fairly independent of $p_T$, in
contrast to the uncertainty due to PDFs, which has a
strong $p_T$ dependence. The theory uncertainty is comparable
to the PDF uncertainty in the low $p_T$ range,
but is much smaller than it in the high $p_T$ range.

• The theoretical uncertainties are reduced in the cal- 
culation that includes the partial 2-loop correction. 
Whether this reduction provides a genuine increase in 
accuracy depends on the reliability of the approximation,
which is still controversial. We do not use this
correction in the remainder of the paper. 

With these theoretical background studies com- 
pleted, we now proceed to study the impact of the 
Tevatron jet data on determining the gluon distribu- 
tion. 



3. PRELIMINARY GLOBAL FIT WITH THE 
NEW JET DATA 

We use the published CTEQ6.6 PDF set [1 as the 
reference fit for our comparison study. Unless other- 
wise stated, the theoretical and experimental inputs 
are kept the same as in [5| except for the addition of 
the CDF 3 and DO 0] Run II data sets. 

We use the CDF Run II results obtained from the 
midpoint cone jet algorithm, rather than the earlier results
based on the $k_T$ algorithm Q . The two analyses
were carried out on the same events, so it would be incorrect
to include them both; and the ratio of the resulting 
cross sections agrees well with the ratio predicted by 
NLO QCD, as stated in (The CDF data were sup- 
plied to us by one of the CDF authors, so we were 
not affected by errors in the original publication 
which have now been corrected as described in its first 
reference.) 

The CTEQ6.6 central fit and its eigenvector sets 
which characterize the uncertainty are known to de- 
scribe the Run II jet data fairly well, even though those 



data were not available at the time of the CTEQ6.6 
analysis. Thus from the outset we know that no rev- 
olutionary changes will result from incorporating the 
new data into the global analysis. The purpose of our 
study is to quantify what changes there are; and to in- 
vestigate some subtle features that have not been ex- 
plored before, which have implications for our efforts 
to pin down the gluon PDF. 

With the addition of the Tevatron Run II jet data, 
our global analysis includes 37 data sets with a total 
of 2898 data points. As a baseline, when CTEQ6.6 
PDFs are used directly to compute the cross sections 
and then compared to these data points, we obtain 
a good overall fit with $\chi^2 = 2756$. Here and in all
our fits, the full correlated experimental errors are used
in computing $\chi^2$, for all experiments that report their
errors in this form. To get a first look at the impact
of the Run II jet data, we performed a preliminary fit
using the same theoretical input as CTEQ6.6. In this
fit, the weighted $\chi^2$ becomes 2740, a reduction of 16.
This is a very small reduction when spread over all 
2898 data points, or even when spread over the 182 
new ones — as was anticipated since CTEQ6.6 already 
provided a reasonably good fit to the new data. 

The only significant change in the best-fit PDFs 
from CTEQ6.6 to the preliminary fit occurs in the 
gluon PDF. This can be demonstrated by repeating 
the global fit with all of the quark distribution param- 
eters frozen at their CTEQ6.6 values, thus only allow- 
ing the gluon distribution to change. The reduction 
in $\chi^2$ is nearly the same and the resulting fit is essentially
equivalent to the preliminary fit. This confirms
our expectation that the inclusive jet data provide a
handle on $g(x,\mu)$ and little else.



4. PARAMETRIZING THE GLUON 
DISTRIBUTION 

Since the jet data are sensitive to the gluon distribution,
it is essential to use a sufficiently flexible
parametrization for the gluon at the starting scale $\mu_0$
for DGLAP evolution, which we choose to be 1.3 GeV
as in previous analyses. The form we use is

$$g(x,\mu_0) = a_0\, x^{a_1} (1-x)^{a_2} \exp\left(a_3 x + a_4 x^2 + a_5 \sqrt{x}\right) . \tag{1}$$

We add a penalty to the overall $\chi^2$ to force parameter
$a_2$, which controls the behavior at $x \to 1$, to lie
within a reasonable but generous range $0.5 \le a_2 \le 10$.
The form ([1]) is more general than what was used in
CTEQ6.6, which was equivalent to $a_2 = 4$ and $a_5 = 0$.
Alternative parametrizations have also been tested, to
assure that our results are not sensitive to the particular
form of smooth function that we choose to multiply
the basic $x^{a_1}$ and $(1-x)^{a_2}$ factors.
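As an illustration only, the input form described above, a power of x times a power of (1 - x) times the exponential of a polynomial in x and sqrt(x), can be written as a short Python function. The parameter values, the function names, and in particular the quadratic shape and strength of the penalty on a2 are assumptions made for this sketch; the text specifies only the allowed range 0.5 to 10, not the functional form of the penalty.

```python
import math


def gluon_input(x, a):
    """g(x, mu0) = a0 x^a1 (1-x)^a2 exp(a3 x + a4 x^2 + a5 sqrt(x))."""
    a0, a1, a2, a3, a4, a5 = a
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    exponent = a3 * x + a4 * x * x + a5 * math.sqrt(x)
    return a0 * x**a1 * (1.0 - x)**a2 * math.exp(exponent)


def a2_penalty(a2, lo=0.5, hi=10.0, strength=100.0):
    """Hypothetical penalty: zero inside [lo, hi], quadratic outside."""
    if a2 < lo:
        return strength * (lo - a2) ** 2
    if a2 > hi:
        return strength * (a2 - hi) ** 2
    return 0.0


# Illustrative parameter values only; these are not fitted results.
params = (1.0, -0.2, 4.0, 0.5, 0.0, 0.0)
print(gluon_input(0.3, params), a2_penalty(params[2]))
```

Setting a4 = a5 = 0 in this function gives the kind of restricted form whose consequences are examined later in the paper.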

Because the gluon distribution is not strongly con- 
strained by existing data, it has been common to 
use a fairly restricted functional form for the non- 
perturbative input function, compared to the better- 
constrained light quark distributions. A frequent practice
is to start with the minimal form $x^{a_1}(1-x)^{a_2}$ and
incrementally add new parameters until the quality of 
the global fit ceases to improve. This is a sensible 
approach for finding a reasonable "best fit" PDF set. 
But it can produce misleading results by artificially 
reducing the estimated uncertainties, as happened famously
when the CDF Run I measurements of the jet
cross section at first appeared to lie outside the range
of standard model predictions at large $p_T$ [l^, [3]. We
will discuss related examples of this in Secs. [5.1] and [7].

In current practice, the number of parameters used 
by various groups shows wide variation, which de- 
pends both on the constraining power of the input 
experiments included in the analysis and on the sta- 
bility of the analysis method. Using too few param- 
eters can lead to uncertainty estimates that reflect 
the assumed functional forms more than the experi- 
mental constraints. This fact appears to be under- 
appreciated, since the number of parameters used for 
uncertainty studies is commonly kept at a minimum 
level based only on the central fit. 

A technical reason to restrict the number of pa- 
rameters in the uncertainty study is the instability 
of the Hessian method for determining the extreme 
PDF eigenvector sets, which occurs as the number of 
fitting parameters approaches the limit of constrain- 
ing power of the experimental input. We have been 
able to overcome this problem by using the iterative 
method developed in [1^ to control both the instabilities
due to vast disparities in the unscaled eigenvalues
and instabilities caused by the numerical evaluation of 
the second derivatives that define the Hessian matrix. 
The set of tools we have developed provides an orderly
way to obtain stable results as the number of param- 
eters is increased. This is the reason why the CTEQ 
analyses have consistently used a larger number of un- 
certainty eigenvector sets than other groups. The fits 
described in Sec. [7] use 24 parameters to describe the 
PDFs at $\mu_0$.

A Neural Network approach to the PDF analysis 
(NNPDF) [3| has been developed recently to circum- 
vent the parametrization issue. This appears highly 
promising. However, to make this approach as effec- 
tive as possible, it may be important to retain some 
theory-based guidelines on the PDFs at scale $\mu_0$. In
particular, there are good physical arguments behind 



the traditionally assumed behaviors $x^{a_1}$ at $x \to 0$ and
$(1-x)^{a_2}$ at $x \to 1$, which even predict estimates for
the constants $a_1$ and $a_2$ that one may wish to harness.
The validity of those arguments is supported by the 
observation that for the u quark distribution, which 
is the most accurately measured of the PDFs, the fitted
results for $a_1$ and $a_2$ lie close to their theoretical
expectations. 

We now proceed to a detailed study of the compatibility
of the jet data sets with each other and with the
nonjet data. This study also serves as a case study 
of methods to apply when adding new data sets to a 
global fit. 



5. TESTING COMPATIBILITY OF DATA 
SETS USING WEIGHTED $\chi^2$

When one contemplates adding new sets of experi- 
mental data to an existing global analysis, one begins 
by asking a series of questions that can be answered 
systematically by making fits in which the $\chi^2$ for the
new data sets are multiplied by various weight factors.
These weight factors multiply the $\chi^2$ contributions from
individual data sets before they are added to the global
$\chi^2$ that is minimized in the fit, in order to vary how
much influence each set is allocated in determining the
fit. For a related discussion of these ideas, see [17].
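A minimal Python sketch of the weighted sum just described may help fix ideas. The data-set names and chi-squared values below are invented for illustration, and in a real fit each per-set value would itself depend on the PDF parameters being varied.

```python
def weighted_chi2(chi2_by_set, weights):
    """Sum of per-data-set chi^2 values, each multiplied by its weight.

    Data sets not listed in `weights` get the default weight 1.
    """
    return sum(weights.get(name, 1.0) * chi2
               for name, chi2 in chi2_by_set.items())


# Made-up chi^2 values, for illustration only.
chi2_by_set = {"nonjet": 2500.0, "jet_A": 90.0, "jet_B": 120.0}

print(weighted_chi2(chi2_by_set, {}))               # all weights equal to 1
print(weighted_chi2(chi2_by_set, {"jet_A": 10.0}))  # emphasize jet_A
print(weighted_chi2(chi2_by_set, {"jet_B": 0.0}))   # remove jet_B from the fit
```

A weight of 0 removes a data set from the minimization while still allowing its chi-squared to be evaluated afterwards, and a large weight shows how well that set could be fitted if it were given priority.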

Are the new data consistent with theory? can be 
addressed in a minimal way by seeing if $\chi^2$ for the
new data is acceptably close to its nominal range of
$N \pm \sqrt{2N}$ for N data points, at least when these data
are assigned a sufficiently large weight. (In the ideal
situation of Gaussian experimental errors, this range
corresponds to a $1\sigma$ confidence interval around the
best-fit $\chi^2$. In the present case, where the bulk of
the experimental error may come from systematic effects,
this comparison may also reveal deviations from
Gaussian behavior, which are known to occur when the
experimental errors are predominantly systematic.)

Are the new data consistent with the previous exper- 
iments? can be addressed by observing the increase 
in $\chi^2$ for the original data that occurs when the fit is
adjusted to accommodate the new data. 

Are the new data sets consistent with each other? can 
be studied by observing the change in $\chi^2$ for each new
data set in response to changing the weights for the 
other new data sets. This will reveal whether two new 
data sets "pull" in the same direction, or whether on 
the contrary there is a "tension" between them; or 
whether they measure different features, and so have 
little effect on each other. 
Do the new data sets provide significant new constraints?
can be studied in a simple way by exploring
the range of acceptable fits to the original data using 
the Hessian (eigenvector) method, and observing how 
many of these eigenvector sets produce acceptable fits 
to the new data. 
TABLE I: $\chi^2$ for jet experiments with various weights

We will carry out these studies explicitly for the 
case of the four inclusive jet data sets from the Tevatron:
CDF Run I, D0 Run I, CDF Run II, D0 Run
II. Of these, only the Run I sets were included in the 
CTEQ6-CTEQ6.6 analyses. It is well known that the 
Run I data had a substantial impact on the determi- 
nation of the gluon distribution at large x. It will be 
interesting to see whether the Run II data, which are 
based on a much larger integrated luminosity, provide 
significant new constraints. It will also be interesting 
to see whether the Run I data still play a significant 
role after the higher-statistics Run II data have been 
included. It will further be interesting to see whether 
the data from Run II pull the fit in the same directions 
as Run I, or if there is tension between the implica- 
tions of the old and new data sets. We can similarly 
ask about possible tension between the CDF and DO 
data sets. These are questions that have been raised 
at a number of workshops, but they have not been 
approached with the methods we describe here. 

The information needed to answer these questions 
is contained in Table [I], which shows $\chi^2$ for each of the
4 jet experiments obtained by minimizing the total
weighted $\chi^2$ under a variety of choices for the weights
assigned to those experiments. The weighted $\chi^2$ for
the sum of all nonjet experiments is shown in the last 
column, with the no-jets best fit value subtracted for 
convenience. 

The question of whether the jet data sets agree with 
theory according to the "hypothesis testing" criterion 
is answered by seeing whether the $\chi^2$ for these sets lie
within the expected statistical range $N \pm \sqrt{2N}$, where
N is the number of data points in the experiment. 

1. For CDF$_{\rm I}$, the expected range is 25-41. The fit
with all jet weights 1 lies a little outside that
range. This appears to result from unusually
large fluctuations in a couple of the data points:
these data cannot be fitted at much better $\chi^2$
using any plausible smooth function, as is evidenced
by the fact that $\chi^2$ drops to only 47
when a weight of 50 is assigned to this experiment.
(The purpose of including fits with such
a large weight in Table [I] is exactly to obtain this
kind of information.) Unlike the other jet experiments,
CDF$_{\rm I}$ has data only in the central rapidity
region. It is therefore less sensitive to the
gluon distribution than the others, in spite of its
historic importance in changing the view of the
gluon at large x. The range of $\chi^2$ for this experiment
over the entire series of fits shown in the
table is quite small, and it therefore has rather
little influence on the contemporary global fit.

2. For D0$_{\rm I}$, the expected range is 77-103. The best-fit
$\chi^2$ in the fit with all jet weights equal to 1
is 59, dropping to 32 at weight 10 for this experiment.
If only the Run I data with weight
1 are included, we obtain $\chi^2 = 47$ with the
new gluon parametrization, 68 for CTEQ6.5M,
99 for CTEQ6.6M, 124 and 138 for restricted
gluon parametrizations shown in Tables II and
III. Thus this experiment is certainly consistent
with the theory, although the unexpectedly large
range of variations in $\chi^2$ (despite the similarity
in the explored PDF parametrizations) is suggestive
of pronounced non-Gaussian behavior of
the systematic errors for this data set. The fact
that fits to these data can be obtained with
$\chi^2$ so much smaller than the number of points
also suggests that there is something peculiar
about the errors. (The correlated systematic errors
for this experiment were published only as a
single covariance matrix, rather than being broken
out as individual shifts associated with each
specific source of systematic error, whose magnitudes
can be directly examined for plausibility.
Systematic errors given in this form could nevertheless
be analyzed using principal component
analysis [l^, but we have not yet carried this
out.)

3. For CDF$_{\rm II}$, the expected range is 60-84. The fit
gives $\chi^2 = 88$ with all jet weights equal to 1,
dropping to 75 for weight 10, which implies that
these data are consistent with theory.

4. For D0$_{\rm II}$, the expected range is 95-125. The fit
gives $\chi^2 = 121$ with all jet weights equal to 1,
dropping to 116 for weight 10, again eminently
consistent with theory.
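The expected ranges quoted in the list above follow from the N ± sqrt(2N) rule. The short Python sketch below reproduces them. The numbers of data points used here (33, 90, 72, 110) are not stated explicitly in the text; they are inferred from the quoted ranges and should be read as assumptions, although the two Run II values do add up to the 182 new points mentioned earlier.

```python
import math


def expected_chi2_range(n_points):
    """Nominal chi^2 range N - sqrt(2N) to N + sqrt(2N) for N data points."""
    half_width = math.sqrt(2.0 * n_points)
    return n_points - half_width, n_points + half_width


# N values inferred from the quoted ranges (assumptions, not from the text).
n_points = {"CDF Run I": 33, "D0 Run I": 90, "CDF Run II": 72, "D0 Run II": 110}

for name, n in n_points.items():
    lo, hi = expected_chi2_range(n)
    print(f"{name}: N = {n}, expected chi^2 range {lo:.0f}-{hi:.0f}")
```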

The question of whether the jet data sets are con- 
sistent with the rest of the data in the global analysis 
can be addressed by observing the increase in $\chi^2$ for
the nonjet data that occurs when the fit is adjusted
to accommodate the jet data. Table [I] shows that $\chi^2$
for the nonjet data is forced to increase by only 9.6 to
accommodate the 4 jet experiments at weight 1, and
only by 39.6 to accommodate them at weight 10. In
our previous studies of these data for CTEQ6.6, we
estimated that an increase of $\Delta\chi^2 = 100$ could be
tolerated at the 90% confidence level, so the jet experiments
appear consistent with the nonjet data. Note
that we take the "hypothesis testing" point of view of
requiring that $\chi^2/N$ be acceptable for all of the experiments,
rather than the more stringent "parameter
fitting" ($\Delta\chi^2 = 1$) point of view for estimating the
uncertainty limits [17].

The question of whether the four inclusive jet ex- 
periments are consistent with each other in the fit can 
be studied by looking at how increasing the weight for 
some of them affects the $\chi^2$ for the others. From Table [I]
we observe the following:

• The two Run II experiments are fairly consistent with
each other, since for example when CDF$_{\rm II}$ is assigned
weight 10, its $\chi^2$ is not strongly dependent on whether
D0$_{\rm II}$ is assigned weight 1 or 10; and similarly when
D0$_{\rm II}$ is assigned weight 10, its $\chi^2$ is not strongly dependent
on whether CDF$_{\rm II}$ is assigned weight 1 or 10.
However, in each case there is a small increase in $\chi^2$ for
one of the experiments when the weight for the other
is increased, which suggests a bit of tension between
them. That is in fact the case, as can be seen clearly
using a new and more powerful method of analysis that
is discussed in a separate publication [19].

• The consistency between Run I and Run II measurements
is ambiguous. If the Run II experiments are
assigned weight 10, then raising the weight for Run I
data from 1 to 10 improves the fits to Run I as it must,
while making very little change in the $\chi^2$ for the Run II
and nonjet experiments. This suggests that Run I and
Run II data are rather compatible with one another.
On the other hand, if instead the Run I experiments
are assigned weight 10, then raising the weight for Run
II data from 1 to 10 (which improves the fits to Run II
dramatically) raises $\chi^2$ for D0$_{\rm I}$ from 38.6 to 49.7. An
increase of this magnitude suggests tension between
Run I and Run II; indeed, the Run I and Run
II experiments prefer somewhat different shapes of the
gluon PDF, as will be shown in Sec. [7]. Yet the statistical
significance of this level of disagreement cannot
be established firmly, given the abnormally large variations
in $\chi^2$ for D0$_{\rm I}$ that are observed for otherwise very
similar fits. This may be related to the same details of
the systematic error treatment in D0$_{\rm I}$ that allow $\chi^2/N$
to become very small for that experiment. We keep the
Run I data in our final global fit. The fact that the
Run I and Run II experiments are at somewhat different
$\sqrt{s}$ values (1.80 TeV vs. 1.96 TeV) might possibly
supply some useful physics constraint. Also, D0$_{\rm I}$ extends
to higher rapidity than either of the Run II data
sets. The effect of this choice will be studied in Sec. [7]
by examining the effect of instead dropping the Run I
data.

Finally, let us address the question of whether the 
Run II jet data can be expected to reduce the PDF uncertainty.
Table [I] shows that the fit with weight 1 for
both Run I experiments and weight 0 for both Run II
experiments has $\chi^2 = 106$ and 138 for the two Run II
experiments. Trying each of the 44 eigenvector uncertainty
sets of CTEQ6.6, we obtain extreme $\chi^2$ values
of 119 for CDF$_{\rm II}$ and 140 for D0$_{\rm II}$. None of these values
indicate a drastically bad fit, so no great reduction
in the PDF uncertainty can result from including the
new jet data. However, some of the values both for
CDF$_{\rm II}$ and D0$_{\rm II}$ are sufficiently larger than the values
shown in Table [I] that we can expect a small reduction
in the PDF uncertainty as a result of including the
new data. That reduction in uncertainty is examined
directly in Sec. [6.1].



5.1. Fits with restricted gluon parametrizations 

As discussed in Sec. [4], it is important to use a sufficiently
flexible parametrization for the input gluon
distribution. The following studies demonstrate how 
an inadequate parametrization can be exposed by the 
weighting method. 

If we restrict the parametrization ([1]) by setting
$a_4 = a_5 = 0$, we obtain the results shown in Table [II].
With that restriction, the fit to data without jets is
still very good: $\chi^2$ is higher by only 2 units. But the
fit to these nonjet data becomes very bad when the jet
weights are raised to 10 or more; while for smaller jet
weights, the fits to the jet experiments are much worse
than the corresponding fits of Table [I]. If this simplified
parametrization had been used, the jet data would
have mistakenly appeared to be inconsistent with the 
rest of the data. 

TABLE II: Fits to jet experiments with various weights, 
using a restricted gluon parametrization. 



A different simplified gluon parametrization

$$g(x) = a_0\, x^{a_1} (1-x)^{a_2} (1 + a_3 x) , \tag{2}$$

which has been used in studies at HERA [20] (at a
somewhat higher $\mu_0$), has even worse behavior, as is
shown in Table [III]. Here, $\chi^2$ for the nonjet data rises
by 91.4 when the jet data are included at weight 1; and
that weight is not even large enough to obtain good fits
to the jet data. It is perhaps not surprising that the
form ([2]) is inadequate, because the coefficient $a_0$ of
the leading behavior $x^{a_1}$ at $x \to 0$ and the coefficient
$a_0 (1 + a_3)$ of the leading behavior $(1-x)^{a_2}$ at $x \to 1$
might have very different magnitudes, since those
limits are governed by unrelated physics. Hence the
limiting behaviors might require $1 + a_3$ to be very large
or very small, in which case the linear approximation
$1 + a_3 x$ provided by Eq. ([2]) would have to cover a
large range of variation, for which it might be a worse
approximation than the exponential form in ([1]).



6. UNCERTAINTY OF THE GLUON 
DISTRIBUTION: COMPARISON OF 
METHODS 



In this section, we discuss various methods to determine
the uncertainty of parton distributions. We focus
on the gluon distribution at large x because that is the 
primary aspect of the global analysis that is influenced 
by the jet experiments. Because the uncertainty in the 
TABLE III: $\chi^2$ for jet experiments with various weights,
using the restricted gluon parametrization ([2]).

FIG. 3: Gluon uncertainty range by LM method, and some
of the specific fits that define the limits. 



gluon is large, it serves as a strong test of the methods 
used to estimate uncertainties. 

Within the usual context of our global analysis Q, 
parton distribution shape parameters that minimize 
an effective weighted $\chi^2$ function define the "best fit".
All parton distributions defined by the other choices
of the parameters are deemed acceptable (and delineate
the region of the PDF uncertainty allowed by the
analysis) if they produce a value of $\chi^2$ that exceeds
the minimum value by no more than a given tolerance
value $\Delta\chi^2$ (i.e., $\chi^2 \le \min(\chi^2) + \Delta\chi^2$). Appropriate
weights and the tolerance criterion must be chosen to
ensure that all of the accepted fits provide adequate
descriptions of every data set. In the present case, we
estimate that $\Delta\chi^2 = 100$ provides an approximately
90% confidence limit for all experiments included in 
the fit. 







FIG. 4: Gluon uncertainty range by LM method: smaller 
region same as in Fig. [3]; larger region = without Run II
jet data. 



6.1. Lagrange Multiplier method 

The uncertainty of the gluon distribution can be 
found in a straightforward way by the Lagrange Multiplier
(LM) method [21]: at any given value of x, a term
$\lambda\, g(x,\mu_0)$ is added to the $\chi^2$ function that is minimized
by varying the fitting parameters. The parameter $\lambda$ is
adjusted to make the increase in $\chi^2$ above its minimum
value equal to $\Delta\chi^2$. This yields two allowed PDF sets
(one from positive $\lambda$ and one from negative) that provide
the minimum and maximum $g(x,\mu_0)$. The procedure
is carried out at a number of x values to map
out the extremes of the uncertainty range.
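The logic of the Lagrange Multiplier scan can be illustrated on a toy problem. The Python sketch below is not the fitting code used in this analysis: it assumes a two-parameter quadratic chi-squared (so the constrained minimum is known in closed form) and a linear observable G standing in for the gluon at a fixed x. All matrices, vectors and function names are hypothetical. The multiplier is adjusted by bisection until the chi-squared increase equals the tolerance, and the two signs of the multiplier give the two extremes.

```python
def solve2(h, b):
    """Solve the 2x2 linear system h y = b."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * b[0] - h[0][1] * b[1]) / det,
            (h[0][0] * b[1] - h[1][0] * b[0]) / det]


def lm_minimum(lam, h, p0, c):
    """Minimum of (p-p0)^T h (p-p0) + lam * c.p, i.e. p0 - (lam/2) h^-1 c."""
    step = solve2(h, c)
    return [p0[i] - 0.5 * lam * step[i] for i in range(2)]


def delta_chi2(p, h, p0):
    d = [p[i] - p0[i] for i in range(2)]
    return sum(d[i] * h[i][j] * d[j] for i in range(2) for j in range(2))


def lm_extremes(h, p0, c, tolerance):
    """Bisect on lam until delta chi^2 = tolerance; return (G_min, G_max)."""
    lo, hi = 0.0, 1.0
    while delta_chi2(lm_minimum(hi, h, p0, c), h, p0) < tolerance:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if delta_chi2(lm_minimum(mid, h, p0, c), h, p0) < tolerance:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    observable = lambda p: c[0] * p[0] + c[1] * p[1]
    return (observable(lm_minimum(+lam, h, p0, c)),
            observable(lm_minimum(-lam, h, p0, c)))


# Toy inputs (hypothetical): curvature matrix, best fit, observable gradient.
h = [[4.0, 1.0], [1.0, 2.0]]
p0 = [1.0, 0.5]
c = [1.0, 2.0]
g_min, g_max = lm_extremes(h, p0, c, tolerance=100.0)
print(g_min, g_max)
```

For a quadratic chi-squared the result agrees with the closed-form range G(p0) ± sqrt(tolerance × c·h⁻¹c). The practical value of the LM method is that the same scan works without assuming the quadratic form, at the cost of a full refit for each value of the multiplier.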

Results for the gluon uncertainty obtained in this 
way are shown in Fig. [3], together with some of the
specific curves that produced the envelope of extremes. 
The shapes that provide the extremes do not vio- 
late any strong intuition, although those showing a 
peaked structure in $x^2 g(x,\mu_0)$ at large x might not
be expected a priori. (Still larger uncertainties might 
be found if more fine structure were allowed by the 
parametrization; but sharp structures in x are not 
physically expected, and their effect would tend to go 
away at higher scales through the smoothing character 
of DGLAP evolution.) 



It is natural to ask if the extensive new jet data from 
Run II reduce the gluon uncertainty. To answer that 
question, Fig. [4] compares the uncertainty range from
Fig. [3] with the uncertainty range obtained by a similar 
Lagrange Multiplier calculation with the Run II data 
removed from the fit. One sees that the Run II data 
somewhat reduce the gluon uncertainty at large x. 



6.2. Quartic penalties 

A PDF set that deviates from the minimum $\chi^2$ by
an amount $\Delta\chi^2 = 100$ usually provides an acceptable
fit to all experiments and thus cannot be ruled out as a
valid possibility within the uncertainty range according
to the conservative "hypothesis testing" criterion.
But if the increase in $\chi^2$ is not spread widely over
the ~3000 data points, but rather is concentrated in
one or two experiments, or in any small subset of the
data points, it may be an unacceptable fit. This is
found to happen for some of the extreme gluon distributions
obtained in Sec. [6.1], because only the inclusive
jet experiments are sensitive to the gluon distribution
at large x.

To avoid this problem, we could increase the weight
for the jet experiments in the total $\chi^2$ by trial and
error. But we find it simpler and more effective to
add a penalty to $\chi^2$ that is proportional to $(\chi^2/N)^4$
for each of the jet experiments, in order to force the
final fit to agree acceptably with each of those experiments,
without introducing much change in the central
fit. With this change in the definition of the weighted
$\chi^2$ that is minimized, we can continue to use our
established calculational tools. (An alternative method
used by MSTW 0] is to abandon a fixed $\Delta\chi^2$ and instead
to set the maximum allowed displacement along
each eigenvector direction independently, by monitoring
the quality of fit to each of the data sets along
that direction.) The quartic form for the penalty adds
little to $\chi^2$ except near the boundary, so it does not
significantly alter our $\Delta\chi^2 = 100$ tolerance estimate.
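The behavior of such a penalty can be seen in a short sketch. The proportionality constant is not given in the text, so `coeff` below is a hypothetical placeholder, and the number of data points is likewise invented purely for illustration.

```python
def quartic_penalty(chi2, npts, coeff=1.0):
    """Penalty proportional to (chi2/N)^4 for one experiment.
    `coeff` is a hypothetical constant; the text does not specify it."""
    return coeff * (chi2 / npts) ** 4

npts = 100  # hypothetical number of data points in a jet experiment
for ratio in (0.8, 1.0, 1.2, 1.5, 2.0):
    # The penalty stays small while chi2/N is near 1 and rises steeply beyond.
    print(f"chi2/N = {ratio:.1f}  penalty = {quartic_penalty(ratio * npts, npts):.4f}")
```

Because the penalty scales as the fourth power, raising $\chi^2/N$ from 1.0 to 1.5 multiplies it by about five, while fits with $\chi^2/N \lesssim 1$ are barely affected.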

These "quartic penalties" are included in all subsequent
fits in this paper. Our final uncertainty for
the gluon distribution is therefore appreciably smaller
than what is shown in the preliminary study of Figs.
3 and 4.



6.3. Hessian eigenvector method 

FIG. 5: Gluon uncertainty by Hessian method, compared
to extremes at $x$ = 0.20, 0.55, 0.80 by LM method.

In addition to the LM method, the other standard
technique for estimating PDF uncertainties is the Hessian
eigenvector method [2^ . That method works as
follows. The first derivatives of $\chi^2$ with respect to
the fitting parameters are zero at the minimum, so
in the neighborhood of the minimum, $\chi^2$ can be
approximated by Taylor series as a quadratic form in the
fitting parameters. The coefficients of that quadratic
form are the Hessian matrix, which is the matrix of
second derivatives of $\chi^2$ with respect to the fitting
parameters. The eigenvectors of the Hessian matrix can
be used to define eigenvector PDF sets that characterize
the allowed uncertainty range. The uncertainty of
any prediction is calculated by computing the deviation
from the best fit along each eigenvector direction,
and adding those deviations in quadrature separately
for the positive and negative deviations. The gluon
uncertainty calculated this way is shown in Fig. 5,
together with extremes calculated by LM at $x$ = 0.2,
0.5, and 0.8. The agreement between the two methods
is seen to be quite good, although a slightly larger
upper limit is found at $x$ = 0.8 by the LM method,
which is not subject to the quadratic approximation.
The eigenvector method is of course much more convenient
to use than LM, because the LM method requires
tuned fittings of the Lagrange Multiplier parameter for
every extremum point that is desired. So it is comforting
to see this agreement.
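The quadrature rule just described can be sketched as follows. The treatment of the case where both members of an eigenvector pair shift the prediction in the same direction (keeping only the larger shift) is a common convention that we assume here; the text does not spell it out. The function name and the input numbers are illustrative.

```python
import math

def hessian_uncertainty(x0, x_plus, x_minus):
    """Asymmetric uncertainty of a prediction from eigenvector PDF sets.

    x0      : prediction from the best-fit set
    x_plus  : predictions from the sets displaced in the + direction
    x_minus : predictions from the sets displaced in the - direction
    Returns (upward, downward) uncertainties, each a quadrature sum.
    """
    up_sq = 0.0
    down_sq = 0.0
    for xp, xm in zip(x_plus, x_minus):
        # Largest upward and downward shift within this eigenvector pair.
        up_sq += max(xp - x0, xm - x0, 0.0) ** 2
        down_sq += max(x0 - xp, x0 - xm, 0.0) ** 2
    return math.sqrt(up_sq), math.sqrt(down_sq)

# Hypothetical predictions for two eigenvector pairs.
up, down = hessian_uncertainty(1.0, [1.3, 0.9], [0.6, 1.2])
print(up, down)
```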



6.4. Choice of eigenvectors 

The eigenvectors of the Hessian matrix can be
thought of as a choice of basis vectors that define new
fitting parameters $z_i$ for which

$$\chi^2 = \chi^2_{\min} + \sum_{i=1}^{N} z_i^2 \qquad (3)$$

The choice of these eigenvectors is not unique, because
the form (3) is preserved by any further orthogonal
transformation of the coordinates $\{z_i\}$. In the approximation
that $\chi^2$ is a quadratic function of the shape parameters
which parametrize PDFs at $\mu_0$, such a transformation
would not affect the calculation of the uncertainty.
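The invariance can be made explicit in the simplest case. The following is a sketch under two assumptions of ours: the observable $X$ is linear in the $z_i$ with gradient components $c_i$, and $T^2$ denotes the tolerance $\Delta\chi^2$. Neither symbol appears in the original text.

```latex
X = X_0 + \sum_{i=1}^{N} c_i z_i , \qquad
\sum_{i=1}^{N} z_i^2 \le T^2
\;\Longrightarrow\;
\Delta X = \Big( \sum_{i=1}^{N} (T c_i)^2 \Big)^{1/2}
         = T \Big( \sum_{i=1}^{N} c_i^2 \Big)^{1/2} .

z'_i = \sum_{j=1}^{N} R_{ij} z_j , \quad R^{\mathsf T} R = 1
\;\Longrightarrow\;
c'_i = \sum_{j=1}^{N} R_{ij} c_j , \qquad
\sum_{i=1}^{N} c_i'^{\,2} = \sum_{i=1}^{N} c_i^2 .
```

The quadrature sum depends only on the length of the gradient vector, which an orthogonal transformation leaves unchanged; only the way that length is distributed among the individual eigenvector directions changes.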

The freedom to make an additional orthogonal
transformation may offer the possibility to reduce the
number of eigenvectors that are needed to effectively
describe the uncertainty of a particular quantity of interest.
One possible way to attempt this is to diagonalize
the parameter dependence of that quantity, using
a procedure that is sketched in the Appendix and
described explicitly in [19].

An example of this is shown in Fig. 6, which shows
the gluon uncertainty calculated by the eigenvector
method, together with the 48 extreme eigenvector sets
(positive and negative directions along each of the 24
eigenvectors). In the left panel, the eigenvectors are
defined in the traditional way as eigenvectors of the
Hessian. Note that many eigenvectors contribute to
the uncertainty at each value of $x$. (A common method
to make a quick estimate of uncertainty is simply to
look at the extremes over the eigenvector sets, without
adding the individual contributions in quadrature.
That can easily underestimate the uncertainty by a
factor of two or more, as seen here.)

In the right panel of Fig. 6, the eigenvectors are
defined by choosing $G = g(0.55, \mu_0)$ in Eq. (6) of the
Appendix. Note that close to $x = 0.55$, almost all of
the uncertainty comes from just one pair of eigenvector
sets. In CTEQ6.1, it happened by convenient accident
that most of the uncertainty in the gluon distribution
was embodied in a single eigenvector set. By "rediagonalizing"
the Hessian matrix, this type of simplicity
can be gained in other situations; though as seen in
Fig. 6, it may take more than one eigenvector direction
to span the important variations.

FIG. 6: Gluon distributions and uncertainties in CT09 (red) and the eigenvector contributions to them. Left: eigenvectors
by traditional method; Right: eigenvectors by "rediagonalization" method based on diagonalizing $g(\mu_0, x)$ at $x = 0.55$.

A rediagonalization based on the second-derivative
matrix, such as the one carried out here, is not necessarily
the best way to choose the new eigenvector
directions, since there is no theorem to guarantee that
it will result in only a few dominant coefficients. For
example, in this particular case it might have worked
better to ignore the second derivatives, and instead
to simply choose the first new eigenvector direction
along the gradient direction for, say, $g(0.5)$ in the
24-dimensional space; then the second eigenvector could
be chosen along the gradient direction for, say, $g(0.8)$
in the 23-dimensional subspace that is orthogonal to
the first eigenvector, etc. In any case, the option of
redefining the eigenvector directions to simplify the
description of uncertainties in other physics analyses
shows promise for further study.
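The gradient-based alternative described above amounts to a Gram-Schmidt construction. The following sketch assumes hypothetical gradient vectors in a 4-dimensional $z$ space rather than the 24-dimensional space of the fit; `directions_from_gradients` is an illustrative name, not part of any existing code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def directions_from_gradients(gradients, eps=1e-12):
    """Build orthonormal directions: the first along the first gradient,
    the second along the part of the second gradient that is orthogonal
    to the first direction, and so on (Gram-Schmidt)."""
    basis = []
    for grad in gradients:
        v = list(grad)
        for e in basis:
            overlap = dot(v, e)
            v = [vi - overlap * ei for vi, ei in zip(v, e)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:                 # skip gradients already spanned
            basis.append([vi / norm for vi in v])
    return basis

# Hypothetical gradients of g(0.5) and g(0.8) with respect to the z_i.
grad_g05 = [0.8, -0.3, 0.5, 0.1]
grad_g08 = [0.2, 0.9, -0.4, 0.3]
e1, e2 = directions_from_gradients([grad_g05, grad_g08])

# In the linear approximation g(0.5) varies only along e1,
# and g(0.8) varies only along e1 and e2.
print(dot(grad_g05, e1), dot(grad_g05, e2))
print(dot(grad_g08, e1), dot(grad_g08, e2))
```

The remaining directions can be filled in by any orthonormal completion, since by construction neither observable depends on them to first order.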



6.5. Random PDF sets 

FIG. 7: Gluon uncertainty from 50 random PDF sets with
$\Delta\chi^2 = 100$; envelope of 500 such random sets (dotted); full
uncertainty range from Hessian method (shaded region).

Another possible way to characterize the uncertainties
would be to generate a random collection of PDF
sets that lie inside or at the edge of the acceptable
range $\chi^2 < \chi^2_{\min} + \Delta\chi^2$. (In the quadratic approximation,
this would correspond to a sphere in the $N$-dimensional
hyperspace spanned by $\{z_i\}$.) For example,
a set at the edge can be constructed by generating
a random unit vector in the $N$-dimensional parameter
space using the eigenvectors as basis vectors, and
moving away from the minimum point in that direction
until $\chi^2$ has increased by the tolerance $\Delta\chi^2$. The
envelope of results obtained from 500 PDF sets obtained
this way is shown in Fig. 7, together with 50 of
the individual results. Also shown is the uncertainty
obtained by the Hessian method. We see that the envelope
of the random sets covers a much smaller range
than the full uncertainty, even though every one of the
500 sets is at the upper limit for $\chi^2$. This is not surprising,
since the extreme $g(x)$ at any given $x$ can be
thought of as corresponding to a specific direction in
the $N$-dimensional parameter space. The probability
distribution for the component, $z$, of a random unit
vector along any particular direction in $N$ dimensions
can be shown to be $dP/dz \propto (1 - z^2)^{(N-3)/2}$, which
becomes extremely small as $z$ approaches its limit of 1.
For example, when $N = 24$, the probability for $z > 0.6$
is less than 1 in 1000, so the chance of finding a value
close to the true extreme of 1.0 by random sampling is
very small. This conclusion can be understood qualitatively
in a simple way: it is unlikely for the direction
cosine along any particular direction to lie close to its
maximum of 1, since there are $N$ random direction
cosines whose sum of squares must add up to 1.
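The quoted probability can be checked numerically from the stated distribution. The sketch below integrates $(1 - z^2)^{(N-3)/2}$ with Simpson's rule; the function name and the number of integration steps are our own choices.

```python
def tail_probability(z_cut, n_dim, steps=20000):
    """P(z > z_cut) for one component z of a random unit vector in n_dim
    dimensions, using dP/dz proportional to (1 - z^2)^((n_dim - 3)/2)."""
    power = (n_dim - 3) / 2.0

    def density(z):
        # max() guards against tiny negative values from rounding at |z| = 1.
        return max(0.0, 1.0 - z * z) ** power

    def simpson(a, b):
        h = (b - a) / steps            # `steps` must be even
        total = density(a) + density(b)
        for k in range(1, steps):
            total += (4.0 if k % 2 else 2.0) * density(a + k * h)
        return total * h / 3.0

    return simpson(z_cut, 1.0) / simpson(-1.0, 1.0)

p_24 = tail_probability(0.6, 24)
print(f"N = 24: P(z > 0.6) = {p_24:.2e}")
```

For $N = 24$ this gives a one-sided probability below $10^{-3}$, consistent with the statement above. As a cross-check, for $N = 3$ the density is uniform and the same routine returns 0.2.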

The point of this exercise is to show that no conveniently
small collection of PDFs that are all acceptable
fits to the data can approximately cover the full
uncertainty range. It is therefore essential to have a
well-defined way to combine the uncertainties associated
with the various fits in such a collection. In the
Hessian method, this is provided by the rule of adding
uncertainties from eigenvector sets in quadrature. In
the case of random PDF sets, it would require estimating
the uncertainty range for a prediction of a quantity
$X$ using the dispersion $\langle X^2 \rangle - \langle X \rangle^2$ in values calculated
from the random sets.

The above limitation does not apply to Monte Carlo
based sampling methods such as NNPDF [16], since
those methods produce a collection of PDF sets that
directly samples the space of uncertainties. Such a collection
naturally includes some PDF sets that are not
"acceptable" fits to the input data; e.g., in a collection
of 100 Monte Carlo sets, one obviously expects to
find ~10 sets that lie outside of the 90% confidence
region. In this approach, the PDF uncertainty for a
quantity is obtained by simply calculating that quantity
for each of the sample PDF sets: the distribution
of results directly represents the predicted uncertainty
range.
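As a generic illustration of reading an uncertainty range directly off a sample, the sketch below extracts a central interval from a list of predictions. It is not the prescription of any particular Monte Carlo PDF group; the function name and the stand-in input values are hypothetical.

```python
def central_interval(values, coverage=0.90):
    """Central interval containing `coverage` of the sampled predictions,
    obtained by discarding equal numbers of the lowest and highest values."""
    ordered = sorted(values)
    n = len(ordered)
    n_tail = int(round(n * (1.0 - coverage) / 2.0))
    return ordered[n_tail], ordered[n - 1 - n_tail]

# Stand-in for predictions from 100 Monte Carlo PDF sets.
predictions = [float(k) for k in range(100)]
low, high = central_interval(predictions)
outside = sum(1 for x in predictions if x < low or x > high)
print(low, high, outside)
```

With 100 sets and 90% coverage, five values are dropped on each side, so 10 sets fall outside the interval, in line with the expectation stated above.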



7. UNCERTAINTY OF THE GLUON 
DISTRIBUTION: RESULTS 

The CT09 fit discussed in Sees. and O is our 
most up-to-date set of parton distributions. The central
gluon fit and its uncertainty are shown in Fig. 8
at scales $\mu$ = 2 GeV and 100 GeV, compared with the
previous CTEQ6.6 Q fit. The uncertainty band has
narrowed somewhat as a result of including the new jet
data and the quartic penalties, except at extremely
large $x$, where the more flexible gluon parametrization
in CT09 has broadened the allowed range. There is
a strong overlap between the old and new uncertainty
bands, and the central fit has shifted by an amount
that is within or just at the edge of those bands. At a
large scale such as $\mu$ = 100 GeV, there is rather little
change between the old and new determinations.

Fig. 8 shows that the CT09 central fit at small scale
has a featureless behavior at large $x$, in contrast to
the mild "shoulder" structure of CTEQ6.6. (The appearance
of this shoulder is enhanced by the factor
that multiplies $g(x)$ in the plot to emphasize the
large $x$ behavior.) Indeed, MSTW @ remark that
in fitting the new jet data, they no longer need to
use their former convoluted method of parametrizing
the gluon in the DIS scheme and transforming it to
$\overline{\rm MS}$. However, we find that with a properly flexible
parametrization, some type of shoulder structure is not
ruled out; indeed, the original CTEQ6.6 central fit
for the gluon distribution still lies within our allowed
uncertainty range. In detail, the $\chi^2$ values for the jet
experiments (CDF$_{\rm I}$, D0$_{\rm I}$, CDF$_{\rm II}$, D0$_{\rm II}$) are (54, 59, 91, 122)
in CT09; (52, 55, 116, 121) in a fit with the gluon
shape identical to CTEQ6.6; and (53, 60, 97, 120) in
a fit using the CTEQ6.6 gluon parametrization with
the parameters refitted.

The change between CTEQ6.6 and CT09 in the
shape of the gluon distribution is a consequence of
interplay between adding the Run II jet data and increasing
the flexibility of the gluon parametrization.
This is studied in Fig. 9. The solid curve and shaded
region are again CT09 and its uncertainty. The dotted
curve is CTEQ6.6. The dot-dash curve is the result
of repeating the CTEQ6.6 fit using the CT09 gluon
parametrization. Note that this increased freedom for
the gluon shape enhances the shoulder, and does not
move the fit closer to CT09. The short-dashed curve is
the result of including the Run II data, with all other
details of the fit being the same as in CTEQ6.6: this
changes the fit about half way to CT09. But with
the Run II data included, bringing in the more flexible
gluon parametrization now produces the rest of the
change to CT09. Finally, the long-dashed curve is a fit
that is identical to CT09 except for dropping the Run
I jet data. This answers the question raised earlier regarding
the degree of tension between Run I and Run
II jet data from a practical point of view: we see that
the effect of the Run I data on the fit is noticeable but
small compared to the other uncertainties.

FIG. 9: CT09 and variations (see text).

It is instructive to examine the preferences of various
combinations of the four jet data sets in the fit. This
is shown in Fig. 10. The solid curve and shaded region
are again CT09 in both panels. The other curves
were obtained by fits with weight 1 for all nonjet experiments,
and weights 0 or 1 for each jet experiment
as listed in the captions. The four curves in the left
panel correspond to the first four fits in Table [D. There
is a slight difference between the (1,1,1,1) curve and
CT09, because we have chosen to apply somewhat
larger weights (1.3,1.3,2.1,2.1) to these experiments in
CT09. The fit with no input from jet data (0,0,0,0) is
substantially lower than any of the other fits at large
$x$: this is a review of why the first jet data made such
a strong impact on the gluon determination! The four
curves in the right panel show the preferences of the
individual jet experiments. The D0$_{\rm I}$ data show their famous
preference for a peak at large $x$, though Table U
shows that they can be fit with nearly as good a $\chi^2$ without
the peak. The difference between the CDF$_{\rm II}$ and
D0$_{\rm II}$ curves is comparable to our error estimate, which
affirms that our error estimate is not overly conservative.

Figure 11 explores the consequences of some of the
choices that were made in producing CT09. The solid
curve and shaded region are CT09 itself at scale 2 GeV.
We first change the quark masses $m_c = 1.3 \to 1.4$ GeV
and $m_b = 4.5 \to 4.75$ GeV, and change $\mu_0 = 1.3 \to 1.4$ GeV
to maintain $\mu_0 = m_c$. These changes are
found to have a negligible effect on the gluon distribution:
the change is smaller than the width of the
line in the figure.

In our basic fitting procedure we routinely employ
weight factors to improve the quality of fit to
certain key experiments. In particular, weights of 1.3
and 2.1 were applied to the Run I and Run II data
respectively in CT09, and a further contribution proportional
to $(\chi^2/N)^4$ was added for these experiments
as discussed in Sec. 6.2. The dotted curve in Fig. 11
shows the effect of setting all of the weight factors to 1
(including those for the jet experiments) and dropping
the quartic penalty. The resulting change is very small.
(The real purpose of the weights is toward maintaining
acceptable fits to all experiments as we move away
from the best fit to estimate uncertainties.)

The short-dash curve in Fig. 11 shows the effect of
dropping the Run I data, keeping the weights at 1, but
restoring the quartic penalties on Run II jet $\chi^2$ values.
Finally, the long-dash curve is similar except that the
quartic penalties have also been dropped. This fit has
weight 1 for all experiments except for dropping the
older jet data, and no extra penalties added to $\chi^2$.
Some would argue this to be the most natural choice,
though our belief is that it is preferable to apply some
emphasis in the global fit to experiments that measure
an important feature with a relatively small number of
data points. In any case, the uncertainty band shown
is seen to do a reasonable job of encompassing the results
of various plausible choices. If it were made much
narrower by a smaller $\Delta\chi^2$ criterion, it would not do
so. Thus we see that a large part of the uncertainty,
and the need for the $\Delta\chi^2 \approx 100$ criterion, arises from
differences in plausible choices involved in making the
global fit, rather than directly from propagating the experimental
errors given in the data.

FIG. 10: Fits with various weights (CDF$_{\rm I}$, D0$_{\rm I}$, CDF$_{\rm II}$, D0$_{\rm II}$). Left panel: Long dashed dotted = (1,1,1,1), Short dashed
dotted = (1,1,0,0), Long dashed = (0,0,1,1), Short dashed = (0,0,0,0). Right panel: Long dashed dotted = (1,0,0,0),
Short dashed dotted = (0,1,0,0), Long dashed = (0,0,1,0), Short dashed = (0,0,0,1).

FIG. 11: CT09 and results from some alternative choices
(see text).



8. COMPARISON WITH MSTW 

We compare our work with recent results from
MSTW @ in Fig. 12. The solid curve and shaded
region show the central fit and uncertainty range for
CT09, as in the preceding figures. To make a straightforward
comparison, all other curves in Fig. 12 use
the MSTW values $\alpha_s(m_Z) = 0.12018$, $m_c = 1.4$ GeV,
$m_b = 4.75$ GeV. The dotted long dashed curve is a
fit that is the same as CT09 except for the change in
$\alpha_s(m_Z)$ (and the change in quark masses, which has a
negligible effect).

The dotted short dashed curve is MSTW2008NLO.
It is surprisingly different from the $\alpha_s$-modified CT09,
though it lies within our estimated 90% confidence region.

FIG. 12: Solid curve and shaded region is CT09
($\alpha_s(m_Z) = 0.118$). All other curves are fits with $\alpha_s(m_Z) = 0.12018$.
Dotted short dashed curve is MSTW2008NLO.
See text for description of the other curves.

To look for the cause of the difference between the
$\alpha_s$-modified CT09 result and MSTW, we explore a series
of modifications to the CT09 procedure that make
it more like that of MSTW. These are the same modifications
that were discussed in connection with Fig. 11.
First we drop the CDF$_{\rm I}$ and D0$_{\rm I}$ data sets. This
leads to the dotted curve in Fig. 12, which is closer to
the MSTW result at large $x$, but still quite far from
it.

The dashed curve in Fig. 12 corresponds to again
dropping the Run I data sets, while also setting the
weight factors for all experiments to 1 and dropping
the quartic penalties on $\chi^2/N$. This reduces the influence
of the jet data, and hence results in a fit that
is closer to no-jets fits, which have a lower gluon at
large $x$. This dashed-curve fit is the most similar in its
approach and result to that of MSTW; but a noticeable
difference still remains. We can only speculate
on what might be responsible for this difference, with
obvious suspects being the different parametrizations
used, or the neglect of correlated systematic errors for
DIS data in the MSTW fit. Other possible sources
of the difference are the choice of data sets that are
included in the fits, and the kinematical cuts in $Q$ and $W$
that are applied to those data sets. Furthermore, there
are small differences in the treatment of heavy quarks,
and a small difference in the definition of $\alpha_s(\mu)$ at NLO,
even when the values are matched at $\mu = m_Z$ (see (23l. [2^).



9. QUALITY OF THE FITS 

The good agreement of the central fits with the Run
II jet data, when systematic error shifts allowed by the
published data are included, is shown in Figs. 13 and
14. The unshifted data points are also shown. These
are quite far from the theory curves: the systematic
errors are much larger than the statistical ones here, so
fitting the systematic error parameters is an essential
part of fitting these data sets.

There are 24 systematic shifts for CDF$_{\rm II}$, whose fitted
values come out of order 1 as they should: -0.1,
-1.0, -0.3, -1.0, 0.7, -0.2, 0.8, -0.7, -0.7, -0.9, 0.1, 0.6,
1.0, -0.3, -0.3, 0.5, -1.2, 0.4, 0.9, 0.0, -1.3, 0.1, -0.1,
-0.3. The fitted overall normalization factor is 1.02,
which is well within the published 6% error.

There are 22 systematic shifts for D0$_{\rm II}$ (in addition
to the overall normalization). Some of these come out
a bit larger, though they are still of order 1: -0.5, -1.6,
0.0, 0.1, -0.8, -0.5, 0.1, 1.1, -0.4, 1.1, -1.1, -0.4, 0.4,
-1.6, -0.2, -1.9, 0.5, 0.3, 0.2, 1.7, -1.1, -0.1. We presume
these shifts to be reasonable, since their overall
$\chi^2$ probability is acceptable, and since (as is typical
of systematic errors) their experimental assessment
must be partly subjective. For what it is worth, we find
it absolutely necessary for some of these shift parameters
(most notably, "dsys015: eta-intercalibration fit")
to have magnitude larger than 1.5 in order to achieve
an acceptable fit to these data within the global fit.
The fitted overall normalization factor is 0.98, which
is well within the published error estimate.
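The statement that the shifts are "of order 1" can be quantified from the numbers listed above. The sketch assumes the usual convention that each shift is expressed in units of its quoted standard deviation; the summary statistics chosen here (sum of squares, root mean square, number of shifts beyond 1.5) are our own.

```python
import math

# Fitted systematic shifts quoted in the text.
cdf2_shifts = [-0.1, -1.0, -0.3, -1.0, 0.7, -0.2, 0.8, -0.7, -0.7, -0.9, 0.1, 0.6,
               1.0, -0.3, -0.3, 0.5, -1.2, 0.4, 0.9, 0.0, -1.3, 0.1, -0.1, -0.3]
d02_shifts = [-0.5, -1.6, 0.0, 0.1, -0.8, -0.5, 0.1, 1.1, -0.4, 1.1, -1.1, -0.4, 0.4,
              -1.6, -0.2, -1.9, 0.5, 0.3, 0.2, 1.7, -1.1, -0.1]

def summarize(shifts):
    """Count, sum of squares, RMS, and number of shifts with |r| > 1.5."""
    sum_sq = sum(r * r for r in shifts)
    return {
        "n": len(shifts),
        "sum_sq": sum_sq,
        "rms": math.sqrt(sum_sq / len(shifts)),
        "n_large": sum(1 for r in shifts if abs(r) > 1.5),
    }

cdf2 = summarize(cdf2_shifts)
d02 = summarize(d02_shifts)
print("CDF Run II:", cdf2)
print("D0 Run II: ", d02)
```

For shifts drawn from unit Gaussians the sum of squares is expected to be comparable to the number of shifts, which gives a quick check on whether a fitted set is reasonable.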



10. CONCLUSION 

We have carefully examined the NLO treatment of
inclusive jet data and its influence on the determination
of the gluon distribution in a QCD global analysis.
Key features of the analysis are the use of sufficiently
flexible functional forms to reduce parametrization dependence,
and full inclusion of the correlated systematic
errors published by the experiments.

The difference between the new CT09 gluon results
and our previous CTEQ6.6 analysis Q is shown in Fig.
8. At a large scale like $\mu$ = 100 GeV, where the high-$x$
gluon PDF is important for many high-profile signal
and background processes at the Tevatron and LHC,
the impact of the new jet data is quite small compared
to the remaining uncertainty, as was expected
from the outset, since the new data agreed fairly well
with their prediction from CTEQ6.6 and its uncertainty
range.

FIG. 13: Central fit to CDF Run II data. Triangles are the original data points. Squares with error bars include shifts
due to the systematic errors, whose magnitudes are determined in the fit.

At the small scale $\mu$ = 2 GeV, where the constraints
on the gluon are rather indirect, Fig. 8 shows that the
change in the central prediction at some values of $x$
is close to the 90% confidence limit of the uncertainty
estimated in CTEQ6.6. This demonstrates that our
method does not overestimate those uncertainties, in
spite of its tolerance for a range of $\chi^2$ that is large by
ideal statistical standards.

We have introduced an extension of the familiar Hessian
matrix method [l^l for uncertainty analysis. The
extension involves making a further orthogonal transformation
of the coordinates, after the transformation
that diagonalizes the Hessian has been carried out.
This leaves the Hessian matrix in its convenient diagonal
form, while offering the possibility to describe
the uncertainty on a given quantity using a small number
of important eigenvector sets. This is illustrated in
the right-hand side of Fig. 6, where most of the gluon
uncertainty near $x = 0.5$ is given by just one or two
eigenvector pairs. A further application of this extension
of the Hessian method provides a new and improved
method to study the compatibility of the data
sets in a global fit. This is described in a separate
publication [19].

One value of this paper is to document and illus- 
trate methods that can be used to incorporate new 
data sets into a global analysis. There will be many 
opportunities to apply this in the near future, as data 
from Tevatron Run II and HERA Run II continue to 
arrive, and with data from the LHC on the horizon. 

To conclude with a speculation, it is interesting to
compare the extracted gluon distribution with the distributions
for up and down quarks. This comparison
is shown in Fig. 15. The quark distributions have
smaller uncertainties than the gluon, particularly the
up quark, whose larger electric charge makes it prominent
in the extensive body of neutral-current DIS measurements.
Surprising as it may seem, we observe that
at a small scale like $\mu = 1.3$ GeV, the gluon PDF is
most likely larger than the down quark distribution
even at very large $x$. It may or may not even be
larger than the up quark distribution; more data will
be needed to determine that. An important challenge
for further study would be to see if perhaps one can argue
convincingly from models of the nonperturbative
physics of the proton that "valence-like" gluon alternatives
where $g(x, \mu_0) > u(x, \mu_0)$ at large $x$ are unphysical,
in which case the uncertainty in PDFs could
be significantly reduced.

FIG. 14: Central fit to D0 Run II data. Triangles are the original data points. Squares with error bars include shifts due
to the systematic errors, whose magnitudes are determined in the fit.
[Figure panels: D0 Run II, rapidity bins 0.8 < |y| < 1.2, 1.2 < |y| < 1.6, 1.6 < |y| < 2.0, 2.0 < |y| < 2.4; horizontal axis $p_T$ (GeV).]

FIG. 15: Gluon (solid), u quark (dashed), and d quark (dotted) distributions at three different scales.
In Memoriam: It has been our pleasure to work
with, and be inspired by, our late mentor, colleague, 
and friend Wu-Ki Tung. Much of the methodology of 
modern PDF global analysis was his innovation, and 
he remained involved in this work to the end of his life. 
Appendix: Alternative choices for eigenvectors

Here we sketch how the eigenvector PDF sets in a
global QCD analysis can be recalculated to more simply
represent the uncertainty of a particular physics
quantity, such as the gluon distribution at large $x$ that
is studied in this paper. See [19] for a more detailed
description.

The standard Hessian method for error analysis is
based on a quadratic expansion of $\chi^2$ in the neighborhood
of the minimum of a global fit. This expansion
follows from Taylor series:

$$\chi^2 = \chi^2_{\min} + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{\partial^2 \chi^2}{\partial a_i \, \partial a_j} \right)^{(0)} \left( a_i - a_i^{(0)} \right) \left( a_j - a_j^{(0)} \right) \qquad (4)$$

where there are no first-order terms because the expansion
is about the minimum, and terms higher than
second order have been dropped. The $\{a_i\}$ in Eq. (4)
are the parton parameters of the global fit, and quantities
with superscript (0) are evaluated at the minimum
of $\chi^2$. Formally, one can express the displacements
$a_i - a_i^{(0)}$ as linear combinations of the normalized eigenvectors
of the matrix of second derivatives to obtain a
diagonal expression

$$\chi^2 = \chi^2_{\min} + \sum_{i=1}^{N} z_i^2 \qquad (5)$$

in which the new coordinates $\{z_i\}$ are the coefficients
that multiply the eigenvectors. Because nonquadratic
behavior appears at widely different scales in different
directions of the parameter space, and because the
second-derivative matrix must be calculated numerically
by finite differences, it is necessary in practice to
compute the linear transformation from coordinates
$\{a_i - a_i^{(0)}\}$ to coordinates $\{z_i\}$ by a series of iterative
steps jlSj .

The choice of eigenvectors that define the transformation
to the diagonal form (5) is not unique, because
any further orthogonal transformation of the parameters
$\{z_i\}$ will preserve that form. This freedom to
make a further orthogonal transformation can be used
to simultaneously diagonalize any one additional function
of the coordinates within the quadratic approximation.
Specifically, if $G$ is a function of the original
coordinates, one can choose the new coordinates such
that

$$G = G_0 + \sum_{i=1}^{N} \left( P_i z_i + Q_i z_i^2 \right) \qquad (6)$$

while maintaining (5). This form (6), which is accurate
through second order in the $\{z_i\}$, is obtained
by the following recipe: (1) Calculate the symmetric
matrix $(\partial^2 G / \partial z_i \, \partial z_j)_0$ using the "old" $\{z_i\}$ by finite
differences; (2) Express these "old" $\{z_i\}$ as linear combinations
of the eigenvectors of that matrix; (3) The
coefficients of these linear combinations become the
desired "new" $\{z_i\}$. These steps are iterated a few
times to refine the transformation. This procedure is
described explicitly in [19].
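One pass of this recipe can be sketched in two dimensions, where the eigenvectors of a symmetric matrix follow from a single rotation angle. The toy function `toy_G`, the step size `h`, and all function names are hypothetical; the actual procedure works in the full parameter space and is iterated.

```python
import math

def second_derivatives(G, h=1e-2):
    """Step (1): matrix of second derivatives of G at z = 0, by central
    finite differences, for a function of two coordinates."""
    def shifted(i, si, j, sj):
        z = [0.0, 0.0]
        z[i] += si * h
        z[j] += sj * h
        return G(z)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
             for j in range(2)] for i in range(2)]

def rotation_diagonalizing(D):
    """Step (2): orthonormal eigenvectors of a symmetric 2x2 matrix,
    returned as the rows of a rotation matrix."""
    theta = 0.5 * math.atan2(2.0 * D[0][1], D[0][0] - D[1][1])
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def toy_G(z):
    """Hypothetical observable, quadratic in the 'old' coordinates."""
    return (1.0 + 0.3 * z[0] - 0.1 * z[1]
            + 0.05 * z[0] ** 2 + 0.08 * z[0] * z[1] + 0.02 * z[1] ** 2)

D_old = second_derivatives(toy_G)
R = rotation_diagonalizing(D_old)

def new_to_old(w):
    """Step (3): the 'new' coordinates w are the coefficients of the
    eigenvectors, so the 'old' coordinates are z = R^T w."""
    return [R[0][0] * w[0] + R[1][0] * w[1],
            R[0][1] * w[0] + R[1][1] * w[1]]

def G_new(w):
    return toy_G(new_to_old(w))

D_new = second_derivatives(G_new)
print(D_new)  # off-diagonal entries vanish up to rounding
```

Because the transformation is a rotation, the sum of squares of the coordinates is unchanged, so the diagonal form of $\chi^2$ is preserved while the mixed second derivative of $G$ is removed.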

In the iterative procedure used in our previous uncertainty
analyses [3, H, [12] , the quantity $G$ defining
the transformation was the overall length-squared
of the displacement from the minimum in the space of
the original shape parameters: $\sum_{i=1}^{N} (a_i - a_i^{(0)})^2$. To
study the uncertainties of $g(x, \mu)$ at large $x$, we can instead
choose $G$ to be a Mellin moment of some PDF,
such as $\int_0^1 x^n g(x, \mu_0)\, dx$ with $2 \lesssim n \lesssim 5$; or we can
simply choose $G = g(x, \mu_0)$, e.g., at $x = 0.55$ as was
done to create the right-hand side of Fig. 6. To facilitate
the study of some interesting physical quantities,
one might want to choose $G$ to be, say, the cross section
for $W$, $Z$, or Higgs boson production. Another choice,
which is useful for exploring the internal consistency
of a global fit, is to define $G$ as the contribution to
$\chi^2$ from a particular subset of the data. This application is the subject of [19].